feat(ingest): add structured log category #14229

Merged
anshbansal merged 13 commits into master from ab-2025-jul-25-add-log-type
Aug 1, 2025

Conversation

anshbansal
Collaborator

The idea is that our log messages are sometimes not helpful for self-serve debugging. Adding a type to a message tells everyone what it relates to, for example LINEAGE. We can decide on more types later and gradually add this log_type to our logging, so that it is easier to understand the impact of various logs without asking for help. This should be driven by where folks are unable to self-serve using our log messages.

For now I have added it only to the mock data source, to demonstrate it, and to Fivetran, where I recently ran into this and so know it affects lineage.

It shows up in the logs like this:

datahub ingest -c ../tmp/datahub_mock_data.dhub.yaml
[2025-07-25 18:31:16,384] INFO     {datahub.cli.ingest_cli:151} - DataHub CLI version: unavailable (installed in develop mode)
[2025-07-25 18:31:16,410] INFO     {datahub.ingestion.run.pipeline:225} - Sink configured successfully. 
[2025-07-25 18:31:16,503] INFO     {datahub.ingestion.run.pipeline:254} - Source configured successfully.
[2025-07-25 18:31:16,503] INFO     {datahub.cli.ingest_cli:132} - Starting metadata ingestion
[2025-07-25 18:31:18,046] ERROR    {datahub.ingestion.source.mock_data.datahub_mock_data:167} - Test Error: This is test error message => This is test error 0
[2025-07-25 18:31:18,047] WARNING  {datahub.ingestion.source.mock_data.datahub_mock_data:175} - Test Warning: This is test warning => This is test warning 0
[2025-07-25 18:31:18,050] INFO     {datahub.cli.ingest_cli:145} - Finished metadata ingestion
Cli report:
{'cli_version': 'unavailable (installed in develop mode)',
 'cli_entry_location': '/Users/aseembansal/code/datahub/metadata-ingestion/src/datahub/ingestion/run/pipeline.py',
 'models_version': 'bundled',
 'py_version': '3.10.16 (main, Jan 18 2025, 09:48:57) [Clang 16.0.0 (clang-1600.0.26.6)]',
 'py_exec_path': '/Users/aseembansal/code/datahub/metadata-ingestion/venv/bin/python3',
 'os_details': 'macOS-15.5-arm64-arm-64bit',
 'mem_info': '122.62 MB',
 'peak_memory_usage': '122.62 MB',
 'disk_info': {'total': '994.66 GB', 'used': '325.72 GB', 'used_initally': '325.72 GB', 'free': '668.94 GB'},
 'peak_disk_usage': '325.72 GB',
 'thread_count': 4,
 'peak_thread_count': 4}
Source (datahub-mock-data) report:
{'aspects': {},
 'samples': {},
 'event_not_produced_warn': True,
 'events_produced': 0,
 'events_produced_per_sec': 0,
 'start_time': '2025-07-25 18:31:16.503604 (1.92 seconds ago)',
 'running_time': '1.92 seconds',
 'failures': [{'title': 'Test Error', 'message': 'This is test error message', 'context': ['This is test error 0']}],
 'warnings': [{'title': 'Test Warning', 'message': 'This is test warning', 'context': ['This is test warning 0'], 'log_type': 'LINEAGE'}],
 'infos': []}
Sink (console) report:
{'total_records_written': 0,
 'records_written_per_second': 0,
 'warnings': [],
 'failures': [],
 'start_time': '2025-07-25 18:31:16.410185 (2.01 seconds ago)',
 'current_time': '2025-07-25 18:31:18.424909 (now)',
 'total_duration_in_seconds': 2.01}

Pipeline finished with at least 1 failures; produced 0 events in 1.92 seconds.
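In the report above, the warning entry carries 'log_type': 'LINEAGE' while the failure entry has no such key. A minimal sketch of how an optional category could be serialized that way is shown below. The `LogEntry` class and its `as_obj` method are hypothetical names for illustration, not DataHub's implementation; only `StructuredLogType` and the `LINEAGE` value come from this PR.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class StructuredLogType(str, Enum):
    # Only LINEAGE appears in this PR; other members would be added later.
    LINEAGE = "LINEAGE"


@dataclass
class LogEntry:
    """Hypothetical report entry, used only for this illustration."""

    title: str
    message: str
    context: List[str] = field(default_factory=list)
    log_type: Optional[StructuredLogType] = None

    def as_obj(self) -> Dict[str, Any]:
        obj: Dict[str, Any] = {
            "title": self.title,
            "message": self.message,
            "context": list(self.context),
        }
        # Emit the key only when a type was set, so untagged entries
        # keep the shape they had before the field existed.
        if self.log_type is not None:
            obj["log_type"] = self.log_type.value
        return obj


failure = LogEntry("Test Error", "This is test error message", ["This is test error 0"])
warning = LogEntry(
    "Test Warning",
    "This is test warning",
    ["This is test warning 0"],
    log_type=StructuredLogType.LINEAGE,
)
```

Leaving the key out for untagged entries, as assumed here, keeps existing report consumers unaffected while tagging is rolled out gradually.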

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Jul 25, 2025

codecov bot commented Jul 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Jul 25, 2025
@github-actions github-actions bot requested a deployment to datahub-wheels (Preview) July 25, 2025 13:11 Abandoned
@github-actions github-actions bot requested a deployment to datahub-wheels (Preview) July 25, 2025 13:15 Abandoned
@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jul 25, 2025
@@ -231,9 +257,19 @@ def warning(
title: Optional[LiteralString] = None,
exc: Optional[BaseException] = None,
log: bool = True,
log_type: Optional[StructuredLogType] = None,
Contributor

Let's rename this to log_category, since "type" would make me think of warn/error/etc.
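A rough sketch of what the signature could look like after the suggested rename, plus one way a reader might pull out all warnings in a category. `StructuredLogCategory`, `SketchReport`, and `warnings_in` are hypothetical names and the sample messages are made up; this is not the merged DataHub code.

```python
import logging
from enum import Enum
from typing import Dict, List, Optional

logger = logging.getLogger("sketch")


class StructuredLogCategory(Enum):
    # Hypothetical name for the renamed enum; LINEAGE is the only value shown in this PR.
    LINEAGE = "LINEAGE"


class SketchReport:
    """Illustrative stand-in for a source report; not DataHub's class."""

    def __init__(self) -> None:
        self.warnings: List[Dict[str, object]] = []

    def warning(
        self,
        message: str,
        context: Optional[str] = None,
        title: Optional[str] = None,
        log: bool = True,
        log_category: Optional[StructuredLogCategory] = None,
    ) -> None:
        entry: Dict[str, object] = {
            "title": title,
            "message": message,
            "context": [context] if context is not None else [],
        }
        # The category is independent of severity: the method name says
        # "warning", the category says which feature area is affected.
        if log_category is not None:
            entry["log_category"] = log_category.value
        self.warnings.append(entry)
        if log:
            logger.warning("%s: %s => %s", title, message, context)

    def warnings_in(self, category: StructuredLogCategory) -> List[Dict[str, object]]:
        # Return only the entries tagged with the given category.
        return [w for w in self.warnings if w.get("log_category") == category.value]


report = SketchReport()
report.warning(
    "Column lineage skipped",  # made-up sample message
    context="table_a",
    title="Lineage Warning",
    log_category=StructuredLogCategory.LINEAGE,
)
report.warning("Slow query", context="table_b", title="Perf Warning")
lineage_warnings = report.warnings_in(StructuredLogCategory.LINEAGE)
```

Keeping severity in the method name and the feature area in `log_category` avoids the ambiguity the comment points out.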

@anshbansal anshbansal changed the title feat(ingest): add structured log type feat(ingest): add structured log category Jul 31, 2025
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Jul 31, 2025
@anshbansal anshbansal force-pushed the ab-2025-jul-25-add-log-type branch from 8c84882 to 5e4cc41 Compare July 31, 2025 12:52
@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Jul 31, 2025
@anshbansal anshbansal merged commit 0f44d60 into master Aug 1, 2025
60 checks passed
@anshbansal anshbansal deleted the ab-2025-jul-25-add-log-type branch August 1, 2025 05:54
Labels
ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge
3 participants